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Abstract 



^ ' We present a new class of multivariate binning-free and nonparametric goodness- 

' of-fit tests. The test quantity energy is a function of the distances of observed and 

simulated observations in the variate space. The simulation follows the probability 
I distribution function /o of the null hypothesis. The distances are weighted with a 

■ weighting function which can be adjusted to the variations of /q. We have investigated 

the power of the test for a uniform and a Gaussian distribution of one or two variates, 
^ , respectively and compared it to that of conventional tests. The energy test with a 

Q ■ Gaussian weighting function is closely related to the Pearson test but is more pow- 

^ . erful in most applications and avoids arbitrary bin boundaries. The test is especially 

D I powerful in the multivariate case. 

> 

X 1 Introduction 

There exist a large number of binning-free goodness-of-fit tests for univariate distributions. 
The extension to the multivariate case is difficult, because there a natural ordering system is 
missing. The popular test on the other hand suffers from the arbitrariness of the binning 
and from lack of power for small samples. 

The test proposed in this article avoids both, ordering and binning. It is especially 
powerful in the multivariate case. It is called energy test, because the definition of the 
test statistic is closely related to the energy of electric charge distributions. To facilitate 
the computation of the test statistic, we approximate the probability distribution function 
(p.d.f.) /o of the null hypothesis by simulated observations. The statistic "energy" can also 
be used to test whether two samples belong to the same parent distribution. 

In Section 2 we define the test and study its properties. Section 3 contains a comparison 
of the power of the energy test with that of other popular tests in one dimension. In Section 
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4 we apply the energy test to two-dimensional problems. We discuss possible modifications 
and extensions of the test in Section 5 and conclude with a summary in Section 6. 



2 The test statistic 
2.1 The energy function 

We define a quantity 0, the energy, which measures the difference between two p.d.f's /o(x) 
and /(x), X G M", by 



The integrals extend over the full variate space. The weight function i? is a monotonically 
decreasing function of the Euklidian distance |x — x'|. Relation with R = l/|x — x'| is 
the electrostatic energy of two charge distributions of opposite sign which is minimum if the 
charges neutralize each other. Setting the function R = 5(x — x'), where (5(x — x') is the 
Dirac Delta function, reduces to the integrated quadratic difference of the two p.d.f.s 



Since we want to generalize (0) in such a way that we can apply it to a comparison of a 
sample with a distribution or of two samples with each other, (ps is not suitable. Functions 
like Gaussians which correlate different locations have to be used. 

The test which we will introduce is based on the fact that for fixed /o(x), is minimum 
for the null hypothesis /(x) = /o(x) for all x G M*^. We sketch a prove of this assertion in 
the Appendix. 

Expanding ((T)) 



we obtain three terms which have the form of expectation values of R. They can be estimated 
from two samples Xi, X2, X^r and Yi, Y2, Ya/ drawn from / and /o, respectively. Since 
in the first and the second term a product of identical distributions occurs, there it is not 
necessary to draw two different samples of the same p.d.f.. The expectation value of R in 
these expressions is given by the mean value of R computed from all combinations of the 
sample observations 
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Throughout this article unless specified differently, all sums run from 1 to the maximum 
value of the index. For the energy statistic (pN of a sample Xi, X2, Xjy relative to the 
p.d.f. /o is given by: 

^N = J;^^J^^Y.m^^-^,\) + I J /o(x)/o(xOi?(|x-x'|)dxdx'+ 
-^E / /o(x')i?(|x.-x'|)dx'. 

To test whether a statistical sample Xi, X2, X^v of size is compatible with the null 
hypothesis Hq, we could in principle use the test statistic 0Ar of the sample with respect to 
the associated p.d.f. /o according to 07v but since the evaluation of (f)^ usually requires a 
sum over difficult integrals, we prefer to represent /o by a sample Yi, Y2, Ym, usually 
generated through a Monte Carlo simulation. We drop in 4>nm the term depending only on 
/o which is independent from the sample observations and obtain: 

j>i i,j 

Furthermore, we have replaced the denominator of the first term 1/N{N — 1) by 
which has superior small sample properties (see Appendix). Statistical fiuctuations of the 
simulation are negligible if M is large compared to A^, typically M > ION. 



2.2 The weight function 

We have investigated three different types of weight functions, power laws, a logarithmic 
dependence and Gaussians. 



— — for T ^ d 

Rpow{r) = <! for ^ < ^ (3) 



/ X _ / - In r for r > dmm //, x 

Rcir) = exp {-ry{2s^)) (5) 

The first type is motivated by the analogy to electrostatics, the second is long range and 
the third emphasizes a limited range for the correlation between different observations. The 
power K, of the denominator in (jH)) and the parameter s in (jSl) may be chosen differently for 
different dimensions of the sample space and different applications. For slowly varying /o a 
small value of k around 0.1 is recommended. For short range variations the test quantity 
with larger values around 0.3 is more sensitive. 

Also the logarithmic function Riog is well adapted to slowly varying /q. The corresponding 
test quantiles are invariant under the transformation r — > ar. For more rapidly varying /q, 
the Gaussian weight function Rg is recommended. It permits to adjust the range of the 
smearing to the shape of /q. 
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The inverse power law and the logarithm exhibit a pole at r equal to zero which could 
produce infinities in the double discrete version of the energy function (pNM- Very small 
distances, however, should not be weighted too strongly since deviations from /q with sharp 
peaks are not expected and usually inhibited by the finite experimental resolution. We 
eliminate the poles by introducing a lower cutoff c/min for the distances r. Distances less than 
(imin are replaced by dmin- The value of this parameter is not critical at all, it should be of the 
order of the average distance of the simulation points in the region where /o is maximum. 

2.3 The distribution of the test statistic 




^ (t) 



Figure 1: Energy distributions for a Gaussian weighting function (left) and a logarithmic weighting 
function (right) and their approximation by extended extreme value distributions. 

The distribution of <^ depends on the distribution function /o and on the weight function 
R. Figure ^ shows the distributions for uniform /o with Gaussian and logarithmic weight 
functions. The distributions are well described by a generalized extreme value distribution 

depending on three parameters, a scale parameter cr, a location parameter /x, and a shape 
parameter ^. Rather than computing these parameters from the moments of the specific (f) 
distributions, we propose to generate the distribution of the test statistic and the quantiles 
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by a Monte Carlo simulation. As a consequence of the dramatic increase of computing 
power during the last decade, it has become possible to perform the calculations on a simple 
PC within minutes. There is no need to publish tables of percentage points, to distinguish 
between simple and composite hypotheses and also censoring can be incorporated into the 
simulation. 



2.4 Relation to the Pearson test 

Let us assume that we have in a certain x^-bin rij experimental observations and rrii Monte 
Carlo observations and (3 = M/N ^ 1. Then the contribution of that bin is 

~ m,//3 

nf — 2nimi/f3 + ■mf/jS'^ 
mi/13 

For a goodness-of-fit test, an additive constant is irrelevant. Thus we can drop mf in the 
last relation. If in addition the theoretical distribution is uniform and the bins have constant 
size, we can also ignore the denominator. 

Xi - 2 ^ 

Up to a constant factor this last relation corresponds nearly to the energy defined in Q for 
a weight function which is constant inside the bin and zero outside: 

_ riiirii - 1) riimj 1 2/ 
cPnm - iVM iV^^' 

The test apphed to a uniform distribution /q is equivalent to an energy test with a 
box shaped weight function and fixed box locations. 

Replacing the box function by the Gaussian weight function, the sharp cut at the bin 
boundaries and the arbitrary location of the boundary for fixed binning are avoided but the 
idea underlying the test is retained. 



2.5 Optimization of the test parameters in a simple case 

To study the dependence of the power of the test on the choice of the weight function and 
its parameters, we have chosen for Hq a uniform, univariate p.d.f. /o restricted to the unit 
interval [0, 1]. We determined the rejection power with respect to contaminations of /o with 
a linear and two different Gaussian distributions which represent a wide and a more local 
distortion. 

/o = l 

h{X) = 2X (6) 
/2(X) = C2exp(-64(X -0.5)2) 

h{X) = caexp (-256(X - 0.5)^) (8) 
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We required 5% significance level and computed the rejection power which is equal to one 
minus the probability for an error of the second kind. 

As a reference, we also computed the power of a test with bins of fixed width. The 
number of bins B ^ 2N'^/^ was chosen according to the prescription proposed in 'SJ. 

The cut-off parameter d^in for i?^ (power law) and Ri (logarithmic) was set equal to 
l/(4iV). 

In Figure 121 we show the results for samples of 100 observations for 70 % contamination 
with /i, 30 % contamination with /2, and 20 % contamination with f^. Five different values 
of the Gaussian width parameter s, four different power laws and the logarithmic weight 
function have been studied. 

As expected, the linear distribution is best discriminated by slowly varying weight func- 
tions like the logarithm, low power laws and the wide Gaussians. 

The contamination with the narrow Gaussian /s is better recognized by the narrow weight 
functions (s = 1/8, k = 0.3). Here the two wide Gaussian weight functions fail completely. 

Figure El illustrates the dependence of the rejection power on the sample size for the three 
different distortions (0) of the uniform distribution. The amount of contamination 

was reduced with increasing sample size. 

We applied the energy tests with power law /t = 0.3, the logarithmic and two Gaussian 
weight functions with fixed width s = 1/8 (Gfix) and variable width (Gvar). The variable 
width was chosen such that the full width at half maximum is equal to the bin width, 
chosen according to the 2N'^/^ law. This allows for a fair comparison between the two 
methods. 

As expected the linear distribution is best discriminated by the energy test with loga- 
rithmic weight function. The power of the test is considerably worse. The energy test 
with variable Gaussian weight function follows the trend of the test but performs better 
in 14 out of the 15 cases. 

Comparing the samples with sizes 50 and 100, respectively, we realize one of the caveats 
of the test: For the sample of size 50 there are 9 bins. The central bin coincides favorably 
with the location of the distortion peak of the background sample and consequently leads to 
a high rejection power. For the sample of size 100, however, there are 12 bins, thus two bins 
share the narrow peak and the power is reduced. The Gaussian energy test is insensitive to 
the location of the distortion. 

3 Comparison with alternative univariate tests 

We have investigated the following goodness-of-fit tests, some of which have been designed 
for special applications and can be found in Jj: test, Kolmogorov test, Kuiper test, 
Anderson-Darling test, Watson test, Neyman smooth test. Region test. Energy tests with 
logarithmic and Gaussian weight functions. The region test has been developed to detect 
localized bumps by partitioning the variate range of the order statistic into three regions 
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Figure 2: Rejection power of different energy tests for a uniform distribution /q with respect to a 
linear distribution and two Gaussian contaminations, exp[— 64(x — 0.5)^] and ea;p[— 256(a; — 0.5)^]. 
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Figure 3: Power of tests for 3 different contaminations to uniform distribution and 5 different 
sample sizes ranging from 10 to 200. The sliape of the contamination is displayed on top of the 
columns. The type of test is indicated in the lowest left hand plot. 
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Figure 4: Background distributions used to contaminate the uniform distribution. 



with expected probabihties pi,pj,l — pi — Pj. For A^j and Nj observations observed in the 
first two regions, the test statistic is 

R = sup{(iVi - Npif + (iVj - Np^f + [Ni + - N{pi + pj)f 

where Ni is the number of observations with x < Xi. 

Samples contaminated by different background sources were tested against Hq, corre- 
sponding to the uniform distribution. The power of each test for a 5 % significance hmit was 
evaluated from 10^ Monte Carlo simulations containing 100 observations each. 

Samples with background contamination by a linear distribution, Gaussians and the 
following distributions displayed in Figure 0] were generated by 



bA{x) cc (1 — x] 



, , X 1 J- for < X < 0.5 
bB[x) cc I for 0.5<x< 1 

bcix) cc {x - 0.5)^ 



The power of the different tests is presented in Figure El As expected, none of the tests 
is optimum for all kind of distortions. The energy tests are quite powerful independent of 
the background function. 



4 The energy test for two or more variates 

While numerous goodness-of-fit tests exist for univariate distributions in higher dimensions 
only tests of the type have become popular. 
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Figure 5: Rejection power of tests with respect to different contaminations to the uniform distri- 
bution. The contaminations correspond to the distributions shown in the previous figure. 
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The extension of binning-free tests based on linear rank statistics to the muhivariate 
case p; is difficuh because no obvious ordering scheme of the observations exist. Sequential 
ordering in the different variates depends on the sequence in which the variates are chosen. 
In addition to this logical difficulty there is a more fundamental problem: The re-shuffling 
of the observations destroys the natural metric used to display the data. The situation in 
goodness-of-fit problems is very similar to that in pattern recognition. The possibility to 
detect distortions of the expected distribution by background or resolution effects depends 
on an appropriate selection of the random variables. The natural choice is usually given 
by the experiment. By mapping a multivariate space onto a space defined by the ordering 
recipe much of the information contained in the original distribution may be lost. 

4.1 Test for bivariate Gaussian distribution 

It is rather common to deal with samples of observations which are drawn from a multivariate 
normality. Multivariate goodness-of-fit tests have been developed mostly for multinormality. 
An overview can be found in [T]. An extension of the moment tests introduced by K. 
Pearson to two variates is Mardia's test |3]. Skewness jSi and kurtosis (32 are compared to 
the nominal values for the normal distribution, (3i = and = 3 respectively. We compared 
the logarithmic and the Gaussian energy tests to Mardia's tests and to the two-dimensional 
Neyman smooth test [?]. 

We chose a standard Gaussian 

f, = l-exp{-ix' + y')/2) 

to represent the null hypothesis. 

Background samples added to /o are shown in Figure IHl In addition, a uniform back- 
ground in the region (0 < X,Y < 1) and variations of the Gaussian background were 
investigated. All Gaussians have the same width in both coordinates X and Y, but differ in 
the value of the width and the correlation coefficient. 

Figure [7| displays the results. We were astonished how well the energy test competes with 
alternatives especially designed to test normality. We attribute the excellent performance of 
the energy test to the fact that it is sensitive to all deviations of two distributions whereas 
the Mardia tests are based only on two specific moments. 

4.2 Example from particle physics 

Figure |H1 is a scatter plot of data obtained in the particle physics experiment HERA-B at 
DESY. The positions of tracks from decays are plotted against the momentum of the ip 
meson. The 20 square entries represent the experimental data, the black dots correspond to 
a Monte Carlo simulation. 

We have computed the energy statistic with the logarithmic weighting function for 
the experimental data relative to the Monte Carlo prediction. Separately, the energy was 
determined for 100 independent Monte Carlo samples of 20 events each to estimate the 
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Figure 6: Background distributions used to contaminate a two-dimensional Gaussian. 

distribution of under Hq. The experimental value is larger than all Monte Carlo values 
(see Figure IHI), indicating that the data do not follow the prediction. 

5 Extensions and variations of the energy test 

5.1 Normalization to probability density 

The Gaussian energy test has proven to be similar but superior to the test for the uniform 
distribution. For non-uniform p.d.f.s the similarity can be retained by effectively dividing 
the elements of the sums in (jSj) by the p.d.f. /q. 

i?(|x. -x,|) i?'(|x. -x,|) = 

If the density is only given by a Monte Carlo simulation, the density can be estimated from 
the volume of a multidimensional sphere in the variate space containing a fixed number of 
Monte Carlo observations. In the example of Figure |H1 the density at x could be estimated 
from the area A(x) of the smallest circle centered at x containing 10 Monte Carlo points: 
/o(x) ^ 10/A(x). 

5.2 Width of the weighting function R and metric 

Some statisticians recommend to use equal probability bins for the test in one dimension. 
An equivalent procedure for the Gaussian energy test would be to adjust the width of the 
Gaussian weighting function to the p.d.f. 

Rs{rij) R'sirij) = exp (-rV(2sjSj)) 
Si a l//o(xi) 
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Figure 7: Rejection power of tests with respect to different contaminations of a two- 
dimensional Gaussian. The uniform background is denoted by U, the Gaussian contamination 
by N{mean, st.dev.). 



Such an adjustment can be applied independent of the dimension of the variate space 
while an equal probability binning is rather difficult in two or higher dimensions. 

The resolution of R should be adjusted to the expected deviations. The constants s can 
be chosen differently for the different coordinates. Alternatively, a variate transformation 
may be performed before the test is applied. 

The multi-dimensional logarithmic energy test requires a sensible choice of the metric. 
The distances of the observations should be about equal numerically in all directions of the 
variate space. This can be achieved by normalizing the variates to the square root of their 
variance. 

5.3 Clustering of observations 

For very large samples, the computation of may become excessively long even with powerful 
computers. A possible solution to this problem is to combine observations to clusters. The 
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Figure 8: Comparison of experimental distribution (squares) with Monte Carlo simulation (black 
dots). The experimental energy computed from the scatter plot (left) is compared to a Monte Carlo 
simulation of the experiment (right). 



cluster weight is equal to the sum of the observations included in the cluster and its location is 
their center of gravity. The details of the clustering are not important. The simplest method 
consists in combining points in bins formed by a simple grid. Another possibility could be the 
following: Observations and clusters (once clusters have been formed) are chosen randomly 
and combined with all points and clusters which are within a fixed maximum distance. The 
process is terminated when the number of clusters is below an acceptable limit. 



5.4 Are two samples drawn from the same distribution? 

A frequently occurring problem is to check whether two random samples belong to the same 
unknown parent distribution. In the framework of the "energy" concept we can use bootstrap 
or permutation methods to deduce a reference energy distribution. The energy computed 
according to (j2I) is then compared to this distribution. Quantiles can be used to measure the 
compatibility of the samples. 



6 Summary and conclusions 

Energy tests represent a powerful alternative to conventional tests. To our knowledge the 
energy test is the first multi-variate binning-free test which is independent of a subjective 
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ordering system. With a logarithmic weighting function it is very sensitive to long range dis- 
tortions of the distribution to be tested, a situation which is frequent in physics applications. 
With a Gaussian weighting function the energy test has similar properties as tests of the 
type, but avoids arbitrary bin boundaries. It is more powerful than the test in almost 
all cases which we have considered. It can cope with arbitrarily small sample sizes. It is 
astonishing how well the energy test can compete with the Mardia test for testing normality 
in two dimensions. 

The necessity to determine the distribution function of the test statistic for the specific 
application is not a serious restriction with today's computing power. 

The energy statistic can be used to test whether different samples belong to the same 
parent distribution. A corresponding study will be presented in a forthcoming article. 

7 Appendix 

7.1 Continuous case 

We conjecture that /(x) = /o(x) corresponds to a minimum of (/> in (|H). Substituting 
5f(x) = /(x) - /o(x) we obtain 

0=^11 ^7(x)^7(xOi?(|x - x'Ddxdx' > (9) 

where g fulfils the condition J 5f(x)dx = 0. We conjecture that is minimum for 5f(x) = 
for weight functions R which decrease continuously with |x — x'|. 

In the following, we sketch a prove of this conjecture without full mathematical rigor and 
generality. 

The property Q is invariant under a linear transformation 

R B! = aR + const 

where a is a positive scaling factor. Assuming that R is finite, we can restrict its values to 
the range — 1 < -R < 1 with -R(O) = 1. We approximate the function g hj a histogram with 
function value gi in bin i. The size of each bin is A". 

with self-explaining notation. The weights can be substituted by cosines, cos dij = Rij with 
cosOii = 1 in our approximation. The sum then can be written as a sum of vectors gj in M": 



(ph = A^" ^ gigj cos % 




The minimum of (ph is realized only if all gj are equal to zero. 
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7.2 Discrete case 



When we approximate both densities / and /o by distributions of points each, with weights 
equal we require that the energy be minimum if the two samples coincide, Xj = Vj. 
To fulfil this condition, we have to replace the factors 1/(A^ — 1) and 1/(M — 1) by in 

j>i j>i 
id 

To demonstrate the minimum condition which leads to (pNN = 0, we apply an infinitesimal 
shift to one observation. For x^ = for j ^ k and x^ — y^ = 5xfc, only the pair Xfc,yfc 
contributes to the energy, all other terms cancel. 

<pNN = -^[R{\S^k\-Rm 

Since R decreases with its argument, -R(O) — -R(|5xjt|) > 0, we have found a local minimum 
of the energy. 

For the special choice where R is the distance function, i. e. increasing with the distance, 
it has been demonstrated by ^ that 

^ = ^Y1 - + - y^l] - ]^ I] I] 1^^ -yj\<o 

j>i i j 

is maximum for Xj = y^ for all j. Since (f) is constant for constant R, it is plausible that for 
discrete distributions (p is minimum for x^ = y^ for all decreasing functions of the distance, 
and maximum for all increasing functions of the distance, however, we did not succeed in 
finding a general prove of this conjecture. 
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